Learning in non-stationary Partially Observable Markov Decision Processes

Authors

  • Robin JAULMES
  • Joelle PINEAU
  • Doina PRECUP
Abstract

We study the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not perfectly known and may change over time. We present the algorithm MEDUSA+, which incrementally improves a POMDP model using selected queries, while still optimizing the reward. Empirical results show the response of the algorithm to changes in the parameters of a model: the changes are learned quickly and the agent still accumulates high reward throughout the process.
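The abstract describes query-based incremental model learning: the agent refines its estimate of the POMDP parameters from selected queries while continuing to act. A common way to maintain such an estimate is a Dirichlet posterior over transition probabilities, updated by one count per revealed transition. The toy sketch below illustrates only that mechanism; the state/action sizes, function names, and the deterministic "true" dynamics are all hypothetical, and this is not the actual MEDUSA+ algorithm.

```python
import random

# Toy illustration (hypothetical): Dirichlet counts alpha[s][a][s'] over the
# transitions of a 2-state, 2-action hidden process, refined whenever an
# oracle query reveals the true (s, a, s') triple.

N_STATES, N_ACTIONS = 2, 2

# Uniform prior: one pseudo-count per successor state.
alpha = [[[1.0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]

def estimated_transition(s, a):
    """Mean of the Dirichlet posterior over successor states of (s, a)."""
    total = sum(alpha[s][a])
    return [c / total for c in alpha[s][a]]

def query_update(s, a, s_next):
    """A query revealed the true successor state: add one count."""
    alpha[s][a][s_next] += 1.0

# Simulate queries against a true model where action 0 keeps the state
# and action 1 flips it.
random.seed(0)
for _ in range(500):
    s, a = random.randrange(N_STATES), random.randrange(N_ACTIONS)
    s_next = s if a == 0 else 1 - s
    query_update(s, a, s_next)

# The posterior mean concentrates on the true dynamics.
print(estimated_transition(0, 0))  # close to [1.0, 0.0]
print(estimated_transition(0, 1))  # close to [0.0, 1.0]
```

To track the non-stationary parameters the paper is concerned with, a fixed count-update like the one above is not enough; some form of forgetting (for example, decaying old counts before each update) would let the posterior follow parameters that drift over time.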

Similar articles

Learning Stationary Temporal Probabilistic Networks

The paper describes a method for learning representations of partially observable Markov decision processes in the form of temporal probabilistic networks, which can subsequently be used by robotic agents for action planning and policy determination. A solution is provided to the problem of enforcing stationarity of the learned Markov model. Several preliminary experiments are described that co...

Solving Hidden-Semi-Markov-Mode Markov Decision Problems

Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. We introduce in this paper Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments evolving according to a semi-Markov chai...

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

It is well known that any finite state Markov decision process (MDP) has a deterministic memoryless policy that maximizes the discounted longterm expected reward. Hence for such MDPs the optimal control problem can be solved over the set of memoryless deterministic policies. In the case of partially observable Markov decision processes (POMDPs), where there is uncertainty about the world state,...

Hidden-Mode Markov Decision Processes

Samuel P. M. Choi, Dit-Yan Yeung, Nevin L. Zhang. Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. Abstract: Traditional reinforcement learning (RL) assumes that environment dynamics do not change over time (i.e., are stationary). This assumption, however, is not realistic in many real-...

Good Policies for Partially-observable Markov Decision Processes Are Hard to Find

Optimal policy computation in finite-horizon Markov decision processes is a classical problem in optimization with many practical applications. For stationary policies and infinite horizon it is known to be solvable in polynomial time by linear programming, whereas for finite horizon it is a longstanding open problem. We consider this problem for a slightly generalized model, namely partially-obse...

Publication date: 2005